A robust polynomial regression-based voice activity detector for speaker verification
Abstract
Robustness against background noise is a major research area for speech-related applications such as speech recognition and speaker recognition. One of the many solutions to this problem is to detect speech-dominant regions using a voice activity detector (VAD). In this paper, a second-order polynomial regression-based algorithm is proposed that serves a function similar to a VAD for text-independent speaker verification systems. The proposed method aims to separate steady noise/silence regions, steady speech regions, and speech onset/offset regions. The regression is applied independently to each filter band of a mel spectrum, which lets the algorithm fit seamlessly into the conventional extraction process for mel-frequency cepstral coefficients (MFCCs). The k-means algorithm is also applied to estimate the average noise energy in each band for spectral subtraction. A pseudo-SNR-dependent linear thresholding for the final VAD output decision is introduced, based on the k-means energy centers. This thresholding considers the speech presence in each band. Conventional VADs usually neglect the deteriorative effects of additive noise in the speech regions. In contrast, the proposed method decides not only whether speech is present but also whether the frame is dominated by speech or by noise. The performance of the proposed algorithm is compared with a continuous noise tracking method and another VAD method in speaker verification experiments covering five noise types at five SNR levels. The proposed algorithm showed superior verification performance with both the conventional GMM-UBM method and the state-of-the-art i-vector method.
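The two building blocks named in the abstract, a per-band second-order polynomial fit and a k-means estimate of the noise energy, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `quadratic_fit_per_band` and `two_means_1d`, the sliding-window length, the edge padding, and the choice of two clusters are all assumptions made for illustration. The abstract does not say how the regression coefficients are mapped to the three region types or how the threshold is formed, so the sketch stops at the coefficients and the energy centers.

```python
import numpy as np


def quadratic_fit_per_band(log_mel, win=9):
    """Least-squares fit of y = a*t^2 + b*t + c in a sliding window.

    log_mel: array of shape (n_frames, n_bands), one energy track per band.
    win: odd window length in frames (an assumed value, not from the paper).
    Returns an array of shape (n_frames, n_bands, 3) holding (a, b, c),
    with t measured relative to the window centre.
    """
    if win % 2 == 0:
        raise ValueError("win must be odd")
    n_frames, n_bands = log_mel.shape
    half = win // 2
    t = np.arange(-half, half + 1, dtype=float)
    # Design matrix is the same for every frame and band, so invert it once.
    design = np.stack([t ** 2, t, np.ones_like(t)], axis=1)  # (win, 3)
    pinv = np.linalg.pinv(design)                            # (3, win)
    # Repeat edge frames so every frame has a full window (assumed choice).
    padded = np.pad(log_mel, ((half, half), (0, 0)), mode="edge")
    coeffs = np.empty((n_frames, n_bands, 3))
    for i in range(n_frames):
        # Each band is fitted independently: one column per band.
        coeffs[i] = (pinv @ padded[i:i + win]).T
    return coeffs


def two_means_1d(x, n_iter=20):
    """Two-cluster k-means on a 1-D energy track.

    Returns (low_centre, high_centre). The low centre is used here as a
    stand-in for the average noise energy of the band (an assumption).
    """
    lo, hi = float(x.min()), float(x.max())
    for _ in range(n_iter):
        is_hi = np.abs(x - hi) < np.abs(x - lo)
        if is_hi.all() or not is_hi.any():
            break  # degenerate split, keep the current centres
        lo, hi = float(x[~is_hi].mean()), float(x[is_hi].mean())
    return lo, hi
```

With the coefficients in hand, small `a` and `b` values would indicate a locally flat energy track (steady noise or steady speech), while large values would indicate a rising or falling track (onset or offset). That reading is an interpretation of the abstract rather than a rule stated in it.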
Similar resources
Robust voice activity detection for narrow-bandwidth speaker verification under adverse environments
We describe a voice activity detection algorithm which leads to significant improvement of a narrow-bandwidth speaker verification system under harsh environments. This algorithm is based on a time-scale feature which is extracted from wavelet subbands. A statistical quantile filtering technique is proposed to estimate an adaptive noise threshold. A hang-over scheme is then applied to bridge sh...
A Novel Voice Activity Detection Approach for Automatic Speaker Verification
The main goal of this paper is to propose a novel voice activity detector (VAD) approach for speaker verification applications that is both faster and robust to noisy environments. The idea is to use Support Vector Machine (SVM) parameters estimated from the speech and nonspeech of a long utterance to compute the decision function of the proposed VAD approach. Speaker Verification results on TIMIT d...
The following publication :
In this communication we first review the human speech production process and feature extraction approaches commonly used in a speaker verification system. Mel Frequency Cepstral Coefficients (MFCCs), delta (regression) features and Cepstral Mean Subtraction (CMS) are covered. A recently proposed feature set, termed Maximum Auto-Correlation Values (MACVs), which utilizes information from the so...
Using Exciting and Spectral Envelope Information and Matrix Quantization for Improvement of the Speaker Verification Systems
Speaker verification from a few spoken words or sentences has many applications. Many methods, such as DTW, HMM, VQ and MQ, can be used for speaker verification. We applied MQ for its precise, reliable and robust performance with computational simplicity. We also used pitch frequency and log gain contour to further improve system performance.
Multimodal Speaker Verification Based on Electroglottograph Signal and Glottal Activity Detection
To achieve robust speaker verification, we propose a multimodal method which includes additional nonaudio features and a glottal activity detector. An electroglottograph (EGG) is used as the nonaudio sensor. Parameters of the EGG signal are used to augment the conventional audio feature vector. The algorithm for EGG parameterization is based on the shape of the idealized waveform and glottal activity detect...
Journal title:
- EURASIP J. Audio, Speech and Music Processing
Volume: 2017 (issue not listed)
Pages: not listed
Publication date: 2017